Entry Name:  “PKU360-Ye-MC2”

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Tangzhi Ye, Peking University, yetangzhi66@gmail.com PRIMARY

Youfeng Hao, Peking University, ajihyf@gmail.com

Zenghuang Wang, Peking University, wangzhenhuang.zeek@gmail.com

Chufan Lai, Peking University, chufan.lai.1990@gmail.com

Siming Chen, Peking University, simingchen3@gmail.com

Xiaoru Yuan, Peking University, xiaoru.yuan@gmail.com

Jie Liang, Peking University, christy.jie@gmail.com

Zongru Li, Peking University, 1300013035@pku.edu.cn

Zhuo Zhang, Qihoo 360 Technology Co. Ltd, zhangzhuo@360.cn

Xin Huang, Qihoo 360 Technology Co. Ltd, huangxin-xy@360.cn

Zhanyi Wang, Qihoo 360 Technology Co. Ltd, wangzhanyi@360.cn

Chuanming Huang, Qihoo 360 Technology Co. Ltd, huangchuanming@360.cn

 

Student Team:  NO

 

Did you use data from both mini-challenges?  YES

 

Analytic Tools Used:

PKU Space-Time explorer developed by PKUVIS

Python pandas. http://pandas.pydata.org/

 

 

Approximately how many hours were spent working on this submission in total?

200 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

 

Video Download

Video:

http://vis.pku.edu.cn/vast2015/VC2015MC2.mp4

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Identify those IDs that stand out for their large volumes of communication.  For each of these IDs

 

      a.        Characterize the communication patterns you see.

      b.        Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

Response:

 

Image: Screenshot 2015-07-08 5.39.06 PM.png

 

 

We found three IDs that stand out for their large volumes of communication.

 

 1)    839736

    Patterns: This ID sends messages to everyone and shows high-volume communication in the afternoon, around 12:00 and 15:25.

    With whom: Everyone

    Place: Entry Corridor

    Hypothesis: It may be an ID that broadcasts announcements.

Image: https://lh6.googleusercontent.com/Lg_CjOh-iiYd7h9GCy2RNBcRo3rC2eoF7R6Jm90oQ66WaFKP9mnKNaZrMZ0wyb1_5i1vBgSer6aff2FRbJjHSfA_NOX89kMsoVUPIGrnsA4zChE0MFQWXe5bIEdAhlMIZLIM9Ss

 

       

 

 

2) 1278894:

    Patterns:

    Frequency: Between 11 am and 9 pm, the ID broadcasts at regular 5-minute intervals for one hour and then stops for the next hour. It broadcasts only at fixed timestamps (for example, 11:00:00 a.m., 11:05:00 a.m., 11:10:00 a.m., ...).

    People who communicate with this ID often reply within 5 minutes. The numbers of sent and received messages are almost the same. For example, if this ID sends 60 messages to a visitor in a day, the visitor often sends 60 messages back to this ID.

    With whom: A large number of visitors

    Place: From the Entry Corridor to all places in the park

    Hypothesis: It might be a broadcasting service that announces the schedule of shows, events, or activities in the park.

 

Image: https://lh3.googleusercontent.com/CETZIH1IOsw7qbMI7_LlE6IG9y56byYjpHE5v50Ntnq-psJecgoe8rPgsSYn6ysJmkhcB-Ieo541-UK7sWHtqDJdB4CBmR1zMkVsE5qOMuWZTCj8s8knY1avCiL3BOv9CZWWdB0

 

 

 

3) We also found that the external ID received a large number of messages over the three days.  However, it may stand for many separate individuals rather than a single party.
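
As an illustration of how such high-volume IDs can be surfaced, the sketch below counts sent plus received messages per ID using only the Python standard library. The record layout (timestamp, sender, receiver, location) and the sample rows are assumptions made for this example; it is not the team's actual pipeline.

```python
from collections import Counter


def rank_by_volume(records, top_n=3):
    """Count messages sent plus received per ID and return the top_n.

    `records` is an iterable of (timestamp, sender, receiver, location)
    tuples. This layout is an assumption for the sketch.
    """
    volume = Counter()
    for _, sender, receiver, _ in records:
        volume[sender] += 1
        volume[receiver] += 1
    return volume.most_common(top_n)


# Tiny made-up sample, for illustration only.
sample = [
    ("2014-06-06 12:00:00", "839736", "100", "Entry Corridor"),
    ("2014-06-06 12:00:01", "839736", "101", "Entry Corridor"),
    ("2014-06-06 12:00:02", "839736", "102", "Entry Corridor"),
    ("2014-06-06 12:01:00", "100", "external", "Kiddie Land"),
    ("2014-06-06 12:02:00", "101", "external", "Kiddie Land"),
]
top = rank_by_volume(sample, top_n=2)
print(top[0])
```

Counting both directions means that a pure receiver such as the external ID also shows up in the ranking.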

 

 

 

 

MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

Limit your response to no more than 10 images and 1000 words.

 

Response:

 

First, we extract the patterns of the top IDs:

Pattern 1: This ID has the largest volume of communication, and its communication is periodic.

ID: 1278894

User: Timing reporter

Place: Entry Corridor

To: All visitors

 

Image: https://lh3.googleusercontent.com/CETZIH1IOsw7qbMI7_LlE6IG9y56byYjpHE5v50Ntnq-psJecgoe8rPgsSYn6ysJmkhcB-Ieo541-UK7sWHtqDJdB4CBmR1zMkVsE5qOMuWZTCj8s8knY1avCiL3BOv9CZWWdB0
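
One way to check the periodicity of such an ID is to measure how many of its send times fall exactly on a 5-minute mark. The following is a minimal sketch. The timestamp format and the sample values are assumptions, and the function name is hypothetical.

```python
from datetime import datetime


def on_five_minute_grid(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the fraction of timestamps that fall exactly on a 5-minute mark."""
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    if not parsed:
        return 0.0
    hits = sum(1 for t in parsed if t.minute % 5 == 0 and t.second == 0)
    return hits / len(parsed)


# Made-up send times: three on the grid, one off it.
sample_times = [
    "2014-06-06 11:00:00",
    "2014-06-06 11:05:00",
    "2014-06-06 11:10:00",
    "2014-06-06 11:12:30",
]
share = on_five_minute_grid(sample_times)
print(share)
```

A share close to 1.0 for an ID would be consistent with a scheduled broadcaster, while ordinary visitors would be expected to score much lower.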

 

Pattern 2: This ID has the second-largest volume of communication, which increases suddenly at specific points in time.

ID: 839736

User: Alarm

Place: Entry Corridor

To: All visitors

 

Image: https://lh6.googleusercontent.com/Lg_CjOh-iiYd7h9GCy2RNBcRo3rC2eoF7R6Jm90oQ66WaFKP9mnKNaZrMZ0wyb1_5i1vBgSer6aff2FRbJjHSfA_NOX89kMsoVUPIGrnsA4zChE0MFQWXe5bIEdAhlMIZLIM9Ss

 

After filtering out the top IDs, some cliques show macroscopic patterns, as in the following figure.

Image: https://lh6.googleusercontent.com/AmbnJKDcpbBGdkkQAtre2F_hdW9f7tdZhGqcheYeKmemaml8qwYnUT_LUErhdX7r-xRgoh8ipNFIFRSmLvc5F3lVHHfcdd8VWw8L0R27gWcEqkp9j6PH50Od15rZ9mXgay6jpYo

 

 

Pattern 3: Single persons who did not communicate with other groups.

 

Pattern 4: Groups of more than one person with only internal communications. Compared against the trajectory data in MC1, people in these groups usually share the same trajectory, as in the image below.

Image: Q20150709-1@2x.png
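
Groups with only internal communication correspond to connected components of the communication graph once the top IDs and the external ID are set aside. The sketch below finds such components with a small union-find. It is an assumed reconstruction for illustration, not necessarily how the groups in the figure were derived, and the edge list is made up.

```python
def communication_groups(edges, ignore=()):
    """Group IDs into connected components of the communication graph.

    `edges` is an iterable of (sender, receiver) pairs. Pairs touching
    any ID in `ignore` (e.g. broadcasters, "external") are skipped.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        if a in ignore or b in ignore:
            continue
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up edges: two closed groups plus broadcast and external traffic.
edges = [("1", "2"), ("2", "3"), ("4", "5"),
         ("1278894", "1"), ("1278894", "4"), ("6", "external")]
groups = communication_groups(edges, ignore={"1278894", "839736", "external"})
print(groups)
```

Without the `ignore` set, the broadcasting IDs would link almost every visitor into one component, which is why the top IDs are filtered out first.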

 

The large cluster in the figure of Pattern 2 can in fact be divided into groups using the trajectory data from MC1. These groups show different patterns of internal and external communication.

 

Pattern 5: Inside these groups, some people had only internal communications, while others also had a large volume of external communications. Combining this with the MC1 data, we find that many groups with this pattern have more check-in records at Kiddie Rides (shown in green in the following figure) than other groups. We infer that these groups contain parents who went to Dino Fun World with their children.

 

 

Image: Q20150709-2@2x.png

 

 

Pattern 6: The white dots represent members of the same group. The red lines between them stand for communications. We find that most group members behave consistently throughout the visit. First, they assemble near the entrance. Then they visit different parts of the park separately. Finally, they leave at the same time. They also call partners who are far from the entrance.

 

Image: https://lh5.googleusercontent.com/X0Oqi_rYfH6kwNP2db-Xc8S7ydp1y49_HEGbokkdM16XwKTOOqWtmqH4WF8oZ7J3QWEoluk0XEmdKS0EiKgrLtPOieub-ZnkJwxPueWbWsFJAbljnSLuvW7ZJqnWhrzsrArJvtM

Image: https://lh6.googleusercontent.com/x3TO6rnVlGnZF2htH9kbMO_knLI0Wch89mcgCmdhlq542ROGsPplNIroLCuCbjLb9RKMCFNQXlEteq1C-rx7rvqGJ2t7mbS0wH6Lt6THWRfYJhUvu_o6-bWNUda-c4BULLt0yG4

 

 

Pattern 7: IDs that communicated only with external people

IDs: [1908834,1876730,946866,1912242,905938,1247014,220095,2030871,1523762,1195510,1494252,1458789,1728248,708696,671868,

1929624,1993958,321221,2030370,439584,2056851,1945140,596672,1458915,1336870,1763672,1680161,474843,365259,215220,688489]

These people likely went to the park alone.
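
A filter of this kind can be sketched as follows: collect the set of partners for each ID and keep the IDs whose only partner is the external party. The "external" label, the choice to ignore broadcaster traffic, and the sample pairs are assumptions of this example.

```python
from collections import defaultdict


def external_only_ids(pairs, external="external", ignore=()):
    """Return IDs whose every communication partner is the external party.

    `pairs` is an iterable of (sender, receiver). Pairs touching an ID in
    `ignore` (e.g. park-wide broadcasters) are skipped.
    """
    partners = defaultdict(set)
    for sender, receiver in pairs:
        if sender in ignore or receiver in ignore:
            continue
        partners[sender].add(receiver)
        partners[receiver].add(sender)
    return sorted(i for i, p in partners.items()
                  if i != external and p == {external})


# Made-up pairs: "1" talks only to external, "2" also talks to "3".
pairs = [("1", "external"), ("2", "external"), ("2", "3"), ("1278894", "1")]
only_external = external_only_ids(pairs, ignore={"1278894"})
print(only_external)
```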

 

Pattern 8: IDs that did not communicate at all during the three days. These are the IDs that appear in the MC1 data but not in the MC2 data.

Number of IDs: 1947

Example IDs:  [1736714, 1048587, 1015826, 1605651, 1441834, 1376303, 1638453, 1469109, 393283, 1804733]
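
The comparison between the two data sets amounts to a set difference. A minimal sketch, assuming the MC1 IDs and the MC2 sender and receiver IDs have already been loaded into lists (the values below are made up):

```python
def silent_ids(movement_ids, senders, receivers):
    """IDs seen in the movement data that never send or receive a message."""
    return set(movement_ids) - set(senders) - set(receivers)


mc1_ids = [1, 2, 3, 4]   # hypothetical IDs from the MC1 movement data
mc2_from = [1, 2]        # hypothetical sender IDs from the MC2 data
mc2_to = [2, 3]          # hypothetical receiver IDs from the MC2 data
silent = silent_ids(mc1_ids, mc2_from, mc2_to)
print(sorted(silent))
```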

 

MC2.3 From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

Limit your response to no more than 3 images and 300 words. 

 

Response:

We observe a large volume of messages just before 12:00 pm on Sunday. A closer look at the communication shows that, starting from 11:45 am, people in the Creighton Pavilion kept sending messages to the outside. The overall communication volume increased as well. Soon after, around 11:59 am, the number of people in the Creighton Pavilion began to decrease, which suggests that attendees were being evacuated. We therefore hypothesize that some people became aware of the vandalism at about 11:45 am. Around noon, the park administration became aware of the incident and sent out a broadcast.

   Fig 3-1. High-volume communication around 12:00 pm on Sunday.

 

Image: Snip20150708_46.png

Fig 3-2. Start of messages sent to the outside from the Creighton Pavilion.

 

Image: Snip20150708_47.png

Fig 3-3. Evacuation. Alert sent from the common communication ID.
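
The per-minute view behind this reasoning can be approximated by counting messages per minute, optionally restricted to one location and one receiver. The sketch below assumes records of the form (timestamp, sender, receiver, location) with zero-padded "YYYY-MM-DD HH:MM:SS" timestamps. The location label and the sample rows are hypothetical.

```python
from collections import Counter


def per_minute_counts(records, location=None, receiver=None):
    """Count messages per minute, optionally filtered by location and receiver."""
    counts = Counter()
    for timestamp, _sender, to, place in records:
        if location is not None and place != location:
            continue
        if receiver is not None and to != receiver:
            continue
        counts[timestamp[:16]] += 1  # keep the "YYYY-MM-DD HH:MM" prefix
    return sorted(counts.items())


# Made-up rows around the suspected discovery time.
records = [
    ("2014-06-08 11:44:10", "1", "2", "Creighton Pavilion"),
    ("2014-06-08 11:45:05", "1", "external", "Creighton Pavilion"),
    ("2014-06-08 11:45:40", "3", "external", "Creighton Pavilion"),
    ("2014-06-08 11:46:02", "4", "external", "Entry Corridor"),
]
outgoing = per_minute_counts(records, "Creighton Pavilion", "external")
print(outgoing)
```

A sudden rise in such a series for one location, followed by a drop in the number of people present there, is the kind of signal the hypothesis above relies on.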